Lab 02

Learning objectives

By the end of the lab, you will be able to …

  • A
  • B
  • C

Code-along 02

Download and open code-along-02.qmd

Frequency Distributions

Packages

Load the standard packages.

library(here)
library(tidyverse)
library(gssr)
library(gssrdoc)


Install and load the summarytools package.

# Run once in the console; there is no need to reinstall on every render
install.packages("summarytools")


library(summarytools)

Load your data & codebook

# Get the data only for the 2024 survey respondents
gss24 <- gss_get_yr(2024)

# Load the codebook
data(gss_dict)

Variable Descriptions

Let’s familiarize ourselves with the premarsx and polviews variables.


In the console, type ?premarsx and press Enter. The Help pane will show the question text, the response options, and their numeric values.


Now, do the same for polviews.

Table

Add a line below to also see a table for the polviews variable.

table(gss24$premarsx)

   1    2    3    4 
 357  122  258 1378 


table(gss24$polviews)

   1    2    3    4    5    6    7 
 140  421  368 1148  381  516  186 
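Note that table() leaves out missing values by default, so its counts can add up to fewer than the number of respondents. A minimal sketch with made-up codes (the vector codes is hypothetical, not GSS data) shows how the base R argument useNA = "ifany" brings the missing values back into view:

```r
# Hypothetical response codes with two missing values (not GSS data)
codes <- c(1, 2, 2, 4, NA, 4, 4, NA)

# Default: NA values are silently dropped from the counts
default_tab <- table(codes)

# useNA = "ifany" adds an <NA> column when missing values are present
full_tab <- table(codes, useNA = "ifany")

print(default_tab)
print(full_tab)
```

Comparing the two totals is a quick way to see how many responses a plain table() call is hiding.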

Labels

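Use haven::as_factor() to show the value labels instead of the numeric codes. Notice that the labelled table also lists the missing-value categories (iap, don't know, and so on) that the plain table left out. Then, do the same for polviews.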

table(as_factor(gss24$premarsx))

                 always wrong           almost always wrong 
                          357                           122 
         wrong only sometimes              not wrong at all 
                          258                          1378 
                        other                           iap 
                            0                          1126 
                   don't know            I don't have a job 
                           50                             0 
                  dk, na, iap                     no answer 
                            0                             6 
                not imputable                       refused 
                            0                             0 
               skipped on web                    uncodeable 
                           12                             0 
not available in this release    not available in this year 
                            0                             0 
                 see codebook 
                            0 

Labels

table(as_factor(gss24$polviews))

            extremely liberal                       liberal 
                          140                           421 
             slightly liberal  moderate, middle of the road 
                          368                          1148 
        slightly conservative                  conservative 
                          381                           516 
       extremely conservative                    don't know 
                          186                            99 
                          iap            I don't have a job 
                            0                             0 
                  dk, na, iap                     no answer 
                            0                            20 
                not imputable                       refused 
                            0                             0 
               skipped on web                    uncodeable 
                           30                             0 
not available in this release    not available in this year 
                            0                             0 
                 see codebook 
                            0 

Better Labels

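Let’s use zap_missing() to turn the special missing-value codes (iap, don't know, and so on) into ordinary NA, and then as_factor() to convert the variable to a factor. Both functions come from the haven package. Then, do the same for the polviews variable.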


gss24$premarsx <- zap_missing(gss24$premarsx)
gss24$premarsx <- as_factor(gss24$premarsx)
table(gss24$premarsx)

        always wrong  almost always wrong wrong only sometimes 
                 357                  122                  258 
    not wrong at all                other 
                1378                    0 

Better Labels

# polviews
gss24$polviews <- zap_missing(gss24$polviews)
gss24$polviews <- as_factor(gss24$polviews)
table(gss24$polviews)

           extremely liberal                      liberal 
                         140                          421 
            slightly liberal moderate, middle of the road 
                         368                         1148 
       slightly conservative                 conservative 
                         381                          516 
      extremely conservative 
                         186 
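To see what zap_missing() changes, here is a minimal sketch on a small made-up labelled vector. The vector x and its labels are hypothetical (not GSS data), and the sketch assumes the haven package is installed and that the missing codes are stored as tagged NA values, as they are in files imported from Stata:

```r
library(haven)

# Hypothetical labelled vector (not GSS data): two substantive codes plus
# two tagged missing values, each with its own value label
x <- labelled(
  c(1, 2, 2, tagged_na("d"), tagged_na("i")),
  labels = c(
    "agree"      = 1,
    "disagree"   = 2,
    "don't know" = tagged_na("d"),
    "iap"        = tagged_na("i")
  )
)

# Before zapping: the missing-value labels become factor levels
before <- as_factor(x)

# After zapping: tagged missings become plain NA and their labels are dropped
after <- as_factor(zap_missing(x))

print(table(before))
print(table(after))
```

The order matters: zapping first means the missing-value labels never become factor levels, so they do not show up in the table.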

Better Labels cont.

Let’s use droplevels() to get rid of the empty level (other) in premarsx.

Then, do the same for polviews.

gss24$premarsx <- droplevels(gss24$premarsx)
table(gss24$premarsx)

        always wrong  almost always wrong wrong only sometimes 
                 357                  122                  258 
    not wrong at all 
                1378 

Better Labels cont.

gss24$polviews <- droplevels(gss24$polviews)
table(gss24$polviews)

           extremely liberal                      liberal 
                         140                          421 
            slightly liberal moderate, middle of the road 
                         368                         1148 
       slightly conservative                 conservative 
                         381                          516 
      extremely conservative 
                         186 
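As a small illustration of what droplevels() does, the sketch below uses a made-up factor (answers is hypothetical, not GSS data) with one level that nobody selected:

```r
# Hypothetical factor (not GSS data) with a level that no one selected
answers <- factor(
  c("yes", "no", "yes"),
  levels = c("yes", "no", "other")
)

# "other" is still a level, so it appears with a count of 0
print(table(answers))

# droplevels() removes levels that have no observations
trimmed <- droplevels(answers)
print(table(trimmed))
```

When every level has at least one observation, droplevels() returns the factor unchanged.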

Frequency Table

Make a frequency table with freq(). One of summarytools' main purposes is to help with cleaning and preparing data for further analysis, so pay attention to the missing values (the <NA> row). Then, do the same for polviews.

freq(gss24$premarsx) 
Frequencies  
gss24$premarsx  
Type: Factor  

                             Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
-------------------------- ------ --------- -------------- --------- --------------
              always wrong    357     16.88          16.88     10.79          10.79
       almost always wrong    122      5.77          22.65      3.69          14.48
      wrong only sometimes    258     12.20          34.85      7.80          22.27
          not wrong at all   1378     65.15         100.00     41.64          63.92
                      <NA>   1194                              36.08         100.00
                     Total   3309    100.00         100.00    100.00         100.00

Frequency Table

freq(gss24$polviews) 
Frequencies  
gss24$polviews  
Type: Factor  

                                     Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
---------------------------------- ------ --------- -------------- --------- --------------
                 extremely liberal    140      4.43           4.43      4.23           4.23
                           liberal    421     13.32          17.75     12.72          16.95
                  slightly liberal    368     11.65          29.40     11.12          28.07
      moderate, middle of the road   1148     36.33          65.73     34.69          62.77
             slightly conservative    381     12.06          77.78     11.51          74.28
                      conservative    516     16.33          94.11     15.59          89.88
            extremely conservative    186      5.89         100.00      5.62          95.50
                              <NA>    149                               4.50         100.00
                             Total   3309    100.00         100.00    100.00         100.00
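The difference between the % Valid and % Total columns comes down to the denominator. The sketch below recomputes the premarsx percentages by hand in base R, using the counts printed in the premarsx frequency table; the object names are chosen for illustration only:

```r
# Counts copied from the premarsx frequency table
valid_counts <- c(
  "always wrong"         = 357,
  "almost always wrong"  = 122,
  "wrong only sometimes" = 258,
  "not wrong at all"     = 1378
)
n_missing <- 1194

n_valid <- sum(valid_counts)       # respondents with a substantive answer
n_total <- n_valid + n_missing     # all respondents, including <NA>

# % Valid uses only non-missing responses as the denominator
pct_valid <- round(100 * valid_counts / n_valid, 2)

# % Total uses every respondent as the denominator
pct_total <- round(100 * valid_counts / n_total, 2)

# Cumulative percentages are running sums of the unrounded percentages
pct_valid_cum <- round(cumsum(100 * valid_counts / n_valid), 2)

print(cbind(pct_valid, pct_valid_cum, pct_total))
```

Because more than a third of the premarsx responses are missing, the two sets of percentages differ a lot, so it matters which one you report.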

Pretty Frequency Table

Setting report.nas = FALSE drops the <NA> row and the % Total columns, so the percentages are based on valid responses only.
Setting headings = FALSE suppresses the heading section printed above the table. Then, do the same for polviews.


freq(gss24$premarsx, report.nas = FALSE, headings = FALSE) 

                             Freq        %   % Cum.
-------------------------- ------ -------- --------
              always wrong    357    16.88    16.88
       almost always wrong    122     5.77    22.65
      wrong only sometimes    258    12.20    34.85
          not wrong at all   1378    65.15   100.00
                     Total   2115   100.00   100.00

Pretty Frequency Table

freq(gss24$polviews, report.nas = FALSE, headings = FALSE) 

                                     Freq        %   % Cum.
---------------------------------- ------ -------- --------
                 extremely liberal    140     4.43     4.43
                           liberal    421    13.32    17.75
                  slightly liberal    368    11.65    29.40
      moderate, middle of the road   1148    36.33    65.73
             slightly conservative    381    12.06    77.78
                      conservative    516    16.33    94.11
            extremely conservative    186     5.89   100.00
                             Total   3160   100.00   100.00

Seven categories is a lot. Let’s condense them to make the table easier to interpret.